Document Clustering and Summarization Based on Association Rule Mining for Dynamic Environment
Abstract
Document summarization is a technique that reduces the size of documents and gives a crisp outline of the information in a given group of documents. This paper introduces a new update summarization algorithm that incorporates association rule mining and correlated-concept-based hierarchical clustering for dynamic environments. In this algorithm, the associated concepts are extracted using a rule mining technique (Generating Association Rules based on Weighting Scheme), and the correlated concepts (terms and their related terms) are extracted using a concept extraction algorithm. Extracting concepts based on association rules helps to cluster and summarize similar concepts, which in turn improves the quality of the clusters and of the resulting summary. The performance of the hierarchical clustering based update summarization technique is compared with the existing COBWEB (update summarization) algorithm and with the static summarization algorithms MEAD, CPLN (Centroid, Position, Length and Numerical value) and CPLNVN (Centroid, Position, Length, Numerical value and VerbNoun), using Precision, Recall and F-measure as performance metrics. Scientific literature and 20 Newsgroups are chosen as the data sets for the experimental analysis. The experimental results demonstrate that the proposed algorithm exhibits better performance than the existing summarization algorithms.
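The abstract names Precision, Recall and F-measure as the evaluation metrics but does not say how they are computed. The sketch below is one common reading, assuming that a system summary and a reference summary are each represented as a set of sentence identifiers and that the metrics are taken over their overlap. The function name `precision_recall_f1` and the sample identifiers are hypothetical, and the paper may use a different unit of comparison.

```python
def precision_recall_f1(system, reference):
    """Set-overlap precision, recall and F-measure (F1).

    Assumption: both summaries are collections of sentence IDs;
    this is an illustrative setup, not the paper's stated protocol.
    """
    system, reference = set(system), set(reference)
    overlap = len(system & reference)

    # Precision: share of selected sentences that are in the reference.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of reference sentences that were selected.
    recall = overlap / len(reference) if reference else 0.0

    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence IDs, for illustration only.
    p, r, f = precision_recall_f1({"s1", "s2", "s3", "s4"}, {"s2", "s3", "s5"})
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With four selected sentences, three reference sentences and two in common, precision is 2/4, recall is 2/3, and the F-measure is their harmonic mean, 4/7.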
Similar resources
Implementation of CBC Algorithm for Document Clustering and Summarization
The main objective of this paper is to provide cluster summarization of huge text documents. The mining process includes the sharing of large-scale data from various sources, which gets concluded at the mined data. In distributed data mining, adopting a flat node distribution model can affect scalability, modularity and flexibility, which are being overcome by using dynamic peer to peer docume...
An Efficient Hash-based Association Rule Mining Approach for Document Clustering
Document clustering is one of the important research issues in the field of text mining, where the documents are grouped without predefined categories or labels. High dimensionality is a major challenge in document clustering. Some of the recent algorithms address this problem by using frequent term sets for clustering. This paper proposes a new methodology for document clustering based on Asso...
Mining the Banking Customer Behavior Using Clustering and Association Rules Methods
The unprecedented growth of competition in banking technology has raised the importance of retaining current customers and acquiring new ones, so it is important to analyze customer behavior based on bank databases. Analyzing bank databases for customer behavior is difficult since bank databases are multi-dimensional, comprised of monthly account records and daily t...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...